Abstract

The Random K-Satisfiability Problem, consisting in verifying the existence of
an assignment of N Boolean variables that satisfies a set of $M = \alpha N$ random
logical clauses containing K variables each, is studied using the replica symmetric
framework of diluted disordered systems. We present an exact iterative scheme for
the replica symmetric functional order parameter for the different cases of
interest $K = 2$, $K \geq 3$ and $K \gg 1$. The calculation of the number of
solutions, which allowed us [Phys. Rev. Lett. 76, 3881 (1996)] to predict a first
order jump at the threshold where the Boolean expressions become unsatisfiable
with probability one, is thoroughly displayed. In the case $K = 2$, the (rigorously
known) critical value ($\alpha = 1$) of the number of clauses per Boolean variable
is recovered, while for $K \geq 3$ we show that the system exhibits a replica
symmetry breaking transition. The annealed approximation is proven to be exact
for large K.

PACS Numbers : 05.20 - 64.60 - 87.10

I. INTRODUCTION



The emergent collective behaviours observed in a variety of models of statistical
mechanics, and in particular in frustrated disordered systems, have been recognized to
play a relevant role in apparently distant fields such as theoretical computer science,
discrete mathematics and complex systems theory [1-5]. Computationally hard problems,
characterized (in the worst cases) by an exponential scaling of the running time or
memory requirements of their algorithms, the so-called NP-complete problems [6], are
known to be in one-to-one correspondence with the ground state properties of
spin-glass-like models (see [1] and references therein). As a consequence, tools and
concepts of statistical physics have shed some new light onto the notion of the typical
complexity of NP-complete problems and have led to the definition of new search
algorithms such as simulated annealing, based on the introduction of an artificial
temperature and some cooling procedures [7].

Very recently, other techniques inspired by statistical mechanics, namely finite size
scaling analysis, have also been applied [8] to the study of universal behaviour in the
computational cost (running time) of some classes of algorithms searching for
solutions of random realizations of the prototype NP-complete problem, the
satisfiability (SAT) problem we shall discuss.

More generally, phase transition concepts are starting to play a relevant role in theoret- 
ical computer science [4], where the analysis of general search methods applied to various 
classes of hard computational problems, characterized by a large number of relevant vari- 
ables and generated according to some probability distributions, is of crucial importance in 
building a theory for the typical-case complexity. NP-complete decision problems which 
are computationally hard in the worst case appear not to be really so in the typical case, 
except in critical regions of their parameter space (with a polynomial-exponential pattern) 
where almost all instances of the problems become computationally hard to solve. Far from 
criticality, the problems are either under- or over-constrained and both the stochastic
search procedures and the systematic ones are capable of finding solutions in polynomial
time.

One of the major theoretical open questions in this context is to understand how the
typical-case complexity theory of computer science and spin-glass transitions, in
particular the so-called replica symmetry breaking transition [1], are related. In turn,
computer science is a source of highly non-trivial models containing all the paradigms
necessary to a deeper understanding of the physical properties of disordered frustrated
systems, in particular diluted models for which the theoretical framework is still to be
completed [9,11-15].

Among the known NP-complete problems, the SAT problem is at the same time the 
root problem of complexity theory [6] and a prototype model for phase transition in random 
combinatorial structures [3,16]. SAT was the first problem proved to be NP-complete by 
S. Cook in 1971 [17] and opened the way to the identification of a vast family of other 
NP-complete problems for which a polynomial reduction to SAT became available [6]. In
particular the K-satisfiability (K-SAT) problem, a version of SAT we shall discuss in
great detail in what follows, besides playing a central role in NP-completeness proofs
[6], is a widespread test for the evaluation of the performance of combinatorial search
algorithms, due to the typical intractability of random samples generated near
criticality.

In a recent work [5], we have shown that the methods of statistical mechanics of random
systems allow one to compute some algorithmically relevant quantities such as the
typical entropy of the problem, i.e. the typical number of its solutions, and to clarify
the nature of the
threshold behaviour. The scope of this paper is twofold. On the one hand, we aim at giving 
a thorough discussion of the analytical derivation of the above results, mainly the calculation 
of the entropy jump at the transition. On the other hand, we expose in detail the replica 
symmetric theory of the K-SAT problem by showing both how to go beyond the simplest 
solution proposed in our previous work [5] and by clarifying the connections with known 
results in statistical mechanics of diluted models. 

The paper is organized as follows. Section II is devoted to the presentation of the K-SAT
problem and of the known exact results. Section III contains an outline of the statistical
mechanics approach whereas the replica symmetric solutions are exposed in Section IV. In
the successive sections, from V to VIII, the outcomes of the analytical calculations are
exposed in detail for the different values of K of interest. In Section IX, we show how
to rederive some of the previous results using a simple probabilistic approach. Finally, in
Section X, some new perspectives opened by the introduction of a model which interpolates
smoothly between 2-SAT and 3-SAT are briefly discussed.

II. THE K-SAT PROBLEM AND A BRIEF SURVEY OF KNOWN RESULTS 

Given a set of N Boolean variables $\{x_i = 0, 1\}_{i=1,\ldots,N}$, we first randomly
choose K among the N possible indices i and then, for each of them, a literal $z_i$ that
is the corresponding $x_i$ or its negation $\bar{x}_i$ with equal probabilities one half.
A clause C is the logical OR of the K previously chosen literals, that is C will be true
(or satisfied) if and only if at least one literal is true. Next, we repeat this process
to obtain M independently chosen clauses $\{C_\ell\}_{\ell=1,\ldots,M}$ and ask for all
of them to be true at the same time, i.e. we take the logical AND of the M clauses, thus
obtaining a Boolean expression in the so-called Conjunctive Normal Form (CNF). The
resulting K-CNF formula F may be written as

$$ F = \bigwedge_{\ell=1}^{M} C_\ell = \bigwedge_{\ell=1}^{M} \left( \bigvee_{i=1}^{K} z_i^{(\ell)} \right) , $$   (1)

where $\wedge$ and $\vee$ stand for the logical AND and OR operations respectively.

A logical assignment of the $\{x_i\}$'s satisfying all clauses, that is evaluating F to
true, is called a solution of the K-satisfiability problem. If no such assignment exists,
F is said to be unsatisfiable.

When the number of clauses becomes of the same order as the number of variables
($M = \alpha N$) and in the large N limit - indeed the case of interest also in the
fields of computer science and artificial intelligence [16,18] - the K-SAT problem
exhibits a striking threshold phenomenon. Numerical experiments have shown that the
probability of finding a correct Boolean assignment falls abruptly from one down to zero
when $\alpha$ crosses a critical value $\alpha_c(K)$ of the number of clauses per
variable. Above $\alpha_c(K)$, the clauses can no longer all be satisfied and one becomes
interested in minimizing the number of unsatisfied clauses, which is the optimization
version of K-SAT also referred to as MAX-K-SAT. Moreover, near $\alpha_c(K)$, heuristic
search algorithms get stuck in non-optimal solutions and a slowing-down effect is
observed (intractability concentration). On the contrary, far from criticality heuristic
processes are typically rather efficient [8].
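The construction of Section II can be tried out on very small systems. The following is a minimal sketch, not the procedure used in the numerical studies cited above: it draws random K-CNF formulas as described and tests satisfiability by exhaustive enumeration, which is only feasible for small N. The function names (`random_formula`, `violated`, `is_satisfiable`, `sat_fraction`) and the literal encoding `(index, negated)` are illustrative assumptions.

```python
import itertools
import random

def random_formula(n_vars, n_clauses, k, rng):
    """Draw clauses as in Section II: K distinct variables per clause,
    each one negated with probability 1/2. A literal is (index, negated)."""
    return [[(i, rng.random() < 0.5) for i in rng.sample(range(n_vars), k)]
            for _ in range(n_clauses)]

def violated(formula, assignment):
    """Number of clauses in which every literal is false.
    A literal (i, negated) is false exactly when assignment[i] == negated."""
    return sum(all(assignment[i] == neg for i, neg in clause)
               for clause in formula)

def is_satisfiable(formula, n_vars):
    """Exhaustive search over all 2^N assignments (small N only)."""
    return any(violated(formula, a) == 0
               for a in itertools.product((False, True), repeat=n_vars))

def sat_fraction(n_vars, alpha, k, samples, seed=0):
    """Fraction of satisfiable formulas among `samples` random draws with M = alpha * N."""
    rng = random.Random(seed)
    m = round(alpha * n_vars)
    hits = sum(is_satisfiable(random_formula(n_vars, m, k, rng), n_vars)
               for _ in range(samples))
    return hits / samples

if __name__ == "__main__":
    for alpha in (0.5, 1.0, 1.5, 2.0, 3.0):
        print(alpha, sat_fraction(10, alpha, 2, 50))
```

At such small N the drop of the satisfiable fraction is strongly smeared; the sketch only illustrates the definitions, not the location of the threshold.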

Very schematically, the known results on K-SAT which have been obtained in the frame- 
work of complexity theory may be summarized as follows. 

- For K = 2, 2-SAT belongs to the class P of polynomial problems [19]. P is defined as
the set of computational problems whose best solving algorithms have running times
increasing polynomially with the number of relevant variables [6]. For
$\alpha > \alpha_c$, MAX-2-SAT is NP-complete [6] : NP-complete problems are the hardest
nondeterministic polynomial problems, whose solutions may be found by the exhaustive
inspection of a decision tree of logical depth growing polynomially with the number of
relevant variables; it is generally thought that the running times of their best solving
algorithms scale exponentially with the number of relevant variables [6]. The mapping of
2-SAT onto directed graph theory [20] allows one to derive rigorously the threshold value
$\alpha_c = 1$, and an explicit 2-SAT polynomial algorithm working for
$\alpha < \alpha_c$ has been developed [19].

- For $K \geq 3$, both K-SAT and MAX-K-SAT belong to the NP-complete class. Only
upper and lower bounds on $\alpha_c(K)$ are known from a rigorous point of view
[18,21,22]. Finite size scaling techniques have recently allowed precise numerical values
of $\alpha_c$ to be found for K = 3,4,5,6 [3].

- For $K \gg 1$, clauses become decoupled and an asymptotic expression
$\alpha_c \simeq 2^K \ln 2$ can be easily found. It is not yet known whether this scaling
law is correct or not from a rigorous point of view.
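The large-K estimate quoted in the last item can be illustrated with the annealed (first-moment) count of solutions: a given assignment violates a random clause with probability $2^{-K}$, so the mean number of solutions is $2^N (1 - 2^{-K})^{\alpha N}$. The sketch below, with illustrative function names, evaluates the corresponding annealed entropy per variable and the value of $\alpha$ at which it vanishes, and compares it with $2^K \ln 2$. It is an upper-bound style estimate under the annealed assumption, not a computation of the true threshold.

```python
import math

def annealed_entropy(alpha, k):
    """ln(mean number of solutions) / N = ln 2 + alpha * ln(1 - 2^-K)."""
    return math.log(2.0) + alpha * math.log1p(-2.0 ** (-k))

def annealed_threshold(k):
    """Value of alpha at which the annealed entropy vanishes."""
    return -math.log(2.0) / math.log1p(-2.0 ** (-k))

if __name__ == "__main__":
    for k in (2, 3, 5, 10, 20):
        print(k, annealed_threshold(k), 2 ** k * math.log(2.0))
```

The ratio between the two printed columns tends to one as K grows, which is the content of the asymptotic expression above.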

For brevity, we do not discuss here the results concerning the algorithmic approaches 
to K-SAT and MAX-K-SAT [19,23,24]. We just mention that MAX-K-SAT belongs to
the subclass of NP-complete problems which allows for a polynomial approximation scheme
for quasi-optimal solutions [23]. A recent numerical study of the critical behaviour in the 
computational cost of satisfiability testing can be found in [8]. 

For $\alpha \geq 0$, K-SAT can be cast in the framework of statistical mechanics of
random diluted systems by the identification of an energy-cost function $E(K, \alpha)$
equal to the number of violated clauses [5,16]. The study of its ground state allows one
to address the optimization version of the K-SAT problem as well as to characterize the
space of solutions by its typical entropy, i.e. the degeneracy of the ground state. The
vanishing condition on the ground state energy for a given K corresponds to the
existence of a solution to the K-SAT problem and thus identifies a critical value
$\alpha_c(K)$ of $\alpha$ below which random formulas are satisfiable with probability
one. For $\alpha > \alpha_c(K)$, the ground state energy becomes non zero and gives
information on the maximum number of satisfiable clauses, i.e. on the MAX-K-SAT problem.
Previous works on the statistical mechanics of combinatorial optimization problems - like
traveling salesman, graph partitioning or matching problems [1,25,10,11,9] - focused
mainly on the comparison between the typical cost of optimal configurations and the
algorithmic results. The issues arising in K-SAT are of a different nature, and the key
quantity to be discussed [21] is rather the typical number of existing solutions, i.e.
the ground state typical entropy $S_K(\alpha)$.

A crucial rigorous result on which the whole statistical mechanics approach is founded
concerns the self-averageness taking place in MAX-K-SAT. For any K, independently of
the particular but randomly chosen sample of M clauses, the minimal fraction of violated
clauses is narrowly peaked around its mean value when $N \to \infty$ at fixed $\alpha$ [24].

III. STATISTICAL MECHANICS OF THE K-SAT AND MAX-K-SAT COST FUNCTION

As discussed above, we map the random SAT problem onto a diluted spin energy-cost
function through the introduction of spin variables, $S_i = 1$ if the Boolean variable
$x_i$ is true, $S_i = -1$ if $x_i$ is false. The clause structure is taken into account
by an $M \times N$ quenched random matrix $\Delta$ where $\Delta_{\ell,i} = -1$
(respectively $+1$) if clause $C_\ell$ contains $\bar{x}_i$ (resp. $x_i$), and 0
otherwise. Then the function

$$ E[\Delta, S] = \sum_{\ell=1}^{M} \delta\left[ \sum_{i=1}^{N} \Delta_{\ell,i} S_i ; -K \right] , $$   (2)

where $\delta[i;j]$ denotes the Kronecker symbol, turns out to be equal to the number of
violated clauses in that the quantity $\sum_{i=1}^{N} \Delta_{\ell,i} S_i$ equals $-K$ if
and only if all Boolean variables in the $\ell$-th clause take the values opposite to the
desired ones, i.e. iff the clause itself is false. The above expression can also be
written in a way which is manifestly reminiscent of spin-glass models (and more precisely
neural networks with an extended Hebbian rule [2]),

$$ E[\Delta, S] = \frac{\alpha}{2^K} N + \sum_{R=1}^{K} (-1)^R \sum_{i_1 < i_2 < \ldots < i_R} J_{i_1,i_2,\ldots,i_R} \, S_{i_1} S_{i_2} \cdots S_{i_R} , $$   (3)

where the couplings are defined by

$$ J_{i_1,i_2,\ldots,i_R} = \frac{1}{2^K} \sum_{\ell=1}^{M} \Delta_{\ell,i_1} \Delta_{\ell,i_2} \cdots \Delta_{\ell,i_R} . $$   (4)

In view of the above formulation and of the current knowledge on long-range spin-glasses,
we may already expect qualitatively different behaviours for K = 2 (similar to the
Sherrington-Kirkpatrick model) and $K \geq 3$ (closer to the so-called p-spin or Potts
models) [1]. We shall see in the following that analytical calculations support this
intuitive feeling.
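The equivalence between the Kronecker form (2) and the multi-spin form (3)-(4) follows from writing the indicator of a violated clause as $\prod_i (1 - \Delta_{\ell,i} S_i)/2$ over the K variables of the clause and expanding the product. The sketch below checks this numerically on a small random instance; the helper names and the dense list-of-rows encoding of $\Delta$ are assumptions made for illustration only.

```python
import itertools
import math
import random

def random_delta(n_vars, n_clauses, k, rng):
    """Rows of the clause matrix: +1 if x_i appears, -1 if its negation appears, 0 otherwise."""
    rows = []
    for _ in range(n_clauses):
        row = [0] * n_vars
        for i in rng.sample(range(n_vars), k):
            row[i] = rng.choice((-1, 1))
        rows.append(row)
    return rows

def energy_kronecker(delta, spins, k):
    """Eq. (2): count the clauses whose sum_i Delta_{l,i} S_i equals -K."""
    return sum(sum(d * s for d, s in zip(row, spins)) == -k for row in delta)

def energy_couplings(delta, spins, k):
    """Eqs. (3)-(4): constant term plus R-spin interactions, R = 1..K."""
    n, m = len(spins), len(delta)
    total = m / 2 ** k                      # equals alpha * N / 2^K with alpha = M / N
    for r in range(1, k + 1):
        for idx in itertools.combinations(range(n), r):
            coupling = sum(math.prod(row[i] for i in idx) for row in delta) / 2 ** k
            total += (-1) ** r * coupling * math.prod(spins[i] for i in idx)
    return total

if __name__ == "__main__":
    rng = random.Random(0)
    delta = random_delta(6, 10, 3, rng)
    spins = [rng.choice((-1, 1)) for _ in range(6)]
    print(energy_kronecker(delta, spins, 3), energy_couplings(delta, spins, 3))
```

The explicit sum over index subsets grows quickly with N and K, so this form is useful as a check rather than as a practical way to evaluate the energy.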

Finally, to ensure that the number of Boolean variables in any clause is exactly equal to
K, we impose on $\Delta$ the following constraints

$$ \sum_{i=1}^{N} \Delta_{\ell,i}^2 = K , \quad \forall \ell = 1, \ldots, M . $$   (5)

The ground state (GS) properties of the cost function (2) will reflect those of K-SAT
($E_{GS} = 0$) and MAX-K-SAT ($E_{GS} > 0$). In (2), one may interpret K as the number
of "neighbours" to which each spin is coupled inside a clause. To study the ground state
properties of the cost function (2), we follow the replica approach in the framework of
diluted models, which is indeed much more complicated than that of long-range fully
connected disordered models. As we shall see below, replica theory must be formulated in
a functional form involving not only interactions between pairs of replicas but all
multi-replica overlaps. To be more precise, we shall use below a new order parameter
formulation, inspired by [13,14], which turns out to be much more convenient than the
usual overlaps.

To compute the ground state energy, we first introduce a fictitious temperature
$1/\beta$ to regularize all mathematical expressions and send $\beta \to \infty$ at the
end of the calculation. Note that the introduction of a finite temperature also greatly
helps to understand the physical properties of the model. We proceed by computing the
model "free-energy" density at inverse temperature $\beta$, averaged over the clauses
distribution

$$ F(\beta) = -\frac{1}{\beta N} \overline{\ln Z[\Delta]} , $$   (6)

where $Z[\Delta]$ is the partition function

$$ Z[\Delta] = \sum_{\{S_i\}} \exp\left( -\beta E[\Delta, S] \right) . $$   (7)

As already mentioned, the energy (2) is self-averaging and can therefore be obtained from
the above free-energy. The overline denotes the average over the random clause matrices
satisfying the constraint (5) and is performed using the replica trick
$\overline{\ln Z} = \lim_{n \to 0} (\overline{Z^n} - 1)/n$, starting from integer values
of n. The typical properties of the ground state, i.e. the internal energy and the
entropy, will then be recovered in the $\beta \to \infty$ limit.
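For a single small sample, the partition function (7) can be evaluated by brute force, which makes the role of the fictitious temperature explicit: at large $\beta$, $\ln Z \simeq -\beta E_{min} + \ln(\text{degeneracy})$, so the ground state energy and entropy can be read off. The sketch below does this without any disorder average or replicas; the clause encoding `(index, sign)` and the function names are illustrative assumptions.

```python
import itertools
import math

def energies(clauses, n_vars):
    """Number of violated clauses for every spin configuration.
    A clause is a list of (index, sign), sign = +1 for x_i and -1 for its negation;
    it is violated when sign * S_i == -1 for all its literals."""
    out = []
    for spins in itertools.product((-1, 1), repeat=n_vars):
        out.append(sum(all(sign * spins[i] == -1 for i, sign in clause)
                       for clause in clauses))
    return out

def free_energy_density(clauses, n_vars, beta):
    """F = -ln Z / (beta N) for one sample, cf. eqs. (6)-(7) without the average."""
    e = energies(clauses, n_vars)
    e_min = min(e)
    # factor out the ground state weight so that the sum stays finite at large beta
    log_z = -beta * e_min + math.log(sum(math.exp(-beta * (x - e_min)) for x in e))
    return -log_z / (beta * n_vars)

def ground_state(clauses, n_vars):
    """Ground state energy and entropy per variable by direct enumeration."""
    e = energies(clauses, n_vars)
    e_min = min(e)
    return e_min / n_vars, math.log(e.count(e_min)) / n_vars

if __name__ == "__main__":
    sample = [[(0, 1), (1, -1)], [(0, -1), (2, 1)], [(1, 1), (2, -1)]]
    print(ground_state(sample, 3), free_energy_density(sample, 3, 20.0))
```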

Once averaged over the clause choices, the n-th integer moment of the partition function
depends on the spins only through the multi-overlaps

$$ Q^{a_1,a_2,\ldots,a_{2r}} = \frac{1}{N} \sum_{i=1}^{N} S_i^{a_1} S_i^{a_2} \cdots S_i^{a_{2r}} $$   (8)

involving an even number of replicas. To avoid the introduction of conjugated Lagrange
parameters, we introduce along the lines of [13,14] a new generating function

$$ c(\vec\sigma) = \frac{1}{2^n} \left( 1 + \sum_{r=1}^{n/2} \sum_{a_1 < \ldots < a_{2r}} Q^{a_1,a_2,\ldots,a_{2r}} \, \sigma^{a_1} \sigma^{a_2} \cdots \sigma^{a_{2r}} \right) , $$   (9)

where $\vec\sigma = (\sigma^1, \sigma^2, \ldots, \sigma^n)$ spans the space of all $2^n$
vectors with n binary components $\sigma^a = \pm 1$. The use of this new order parameter
leads to simpler algebraic calculations than the usual procedure involving the overlaps
(8) and their Lagrange multipliers. Its physical interpretation is straightforward :
$c(\vec\sigma)$ equals the fraction of sites i (among all possible N Boolean variables)
such that $S_i^a = \sigma^a$, $\forall a = 1, \ldots, n$. Therefore, all
$c(\vec\sigma)$'s range between zero and one and the global normalization condition
implies that

$$ \sum_{\vec\sigma} c(\vec\sigma) = 1 . $$   (10)

In addition, the vanishing condition on overlaps with an odd number of replica indices reads

$$ c(\vec\sigma) = c(-\vec\sigma) , \quad \forall \vec\sigma . $$   (11)
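The interpretation of $c(\vec\sigma)$ as a fraction of sites can be made concrete on any set of n replicated spin configurations. The sketch below builds the histogram, checks the normalization (10), and recovers a multi-overlap (8) as $\sum_{\vec\sigma} c(\vec\sigma) \prod_a \sigma^a$ over the chosen replica indices. The replicas here are arbitrary spin arrays, not samples from the Boltzmann measure, so the symmetry (11) is not expected to hold for them; all function names are illustrative.

```python
import itertools
import random
from collections import Counter

def order_parameter(replicas):
    """replicas[a][i] is the spin of site i in replica a.
    Returns {sigma: fraction of sites i with S_i^a == sigma^a for all a}."""
    n_sites = len(replicas[0])
    counts = Counter(tuple(rep[i] for rep in replicas) for i in range(n_sites))
    return {sigma: counts[sigma] / n_sites
            for sigma in itertools.product((-1, 1), repeat=len(replicas))}

def overlap_direct(replicas, indices):
    """Eq. (8): site average of the product of spins over the chosen replicas."""
    n_sites = len(replicas[0])
    total = 0
    for i in range(n_sites):
        prod = 1
        for a in indices:
            prod *= replicas[a][i]
        total += prod
    return total / n_sites

def overlap_from_c(c, indices):
    """The same overlap recovered from the order parameter c(sigma)."""
    total = 0.0
    for sigma, weight in c.items():
        prod = 1
        for a in indices:
            prod *= sigma[a]
        total += weight * prod
    return total

if __name__ == "__main__":
    rng = random.Random(0)
    reps = [[rng.choice((-1, 1)) for _ in range(1000)] for _ in range(4)]
    c = order_parameter(reps)
    print(sum(c.values()), overlap_direct(reps, (0, 1)), overlap_from_c(c, (0, 1)))
```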

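The averaged integer moments of the partition function are then given by the following
formula

$$ \overline{Z^n} = \int \prod_{\vec\sigma} dc(\vec\sigma) \; e^{N \mathcal{F}[\{c\},K,\alpha,\beta]} , $$   (12)

where the integration measure is restricted to $c(\vec\sigma)$'s fulfilling constraints
(10,11) and

$$ \mathcal{F}[\{c\},K,\alpha,\beta] = -\sum_{\vec\sigma} c(\vec\sigma) \ln c(\vec\sigma) + \alpha \ln\left[ \sum_{\vec\sigma_1,\ldots,\vec\sigma_K} c(\vec\sigma_1) \cdots c(\vec\sigma_K) \prod_{a=1}^{n} \left( 1 + (e^{-\beta} - 1) \prod_{\ell=1}^{K} \delta[\sigma_\ell^a ; 1] \right) \right] . $$   (13)

We may interpret the above free-energy functional as the free-energy of a system of $2^n$
interacting levels. While the first term in $\mathcal{F}$ simply accounts for the
statistical entropy, the second term represents the interactions between the levels at an
effective "temperature" $1/\alpha$.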

In the large N, M limit (with fixed $\alpha = M/N$), the partition function (12) may be
evaluated by taking the saddle-point over all order parameters c. Since the functional
$\mathcal{F}$ is invariant under permutation of replicas, a natural saddle-point can be
sought within the so-called replica symmetric (RS) Ansatz [11,9,13,14]

$$ c(\sigma^1, \sigma^2, \ldots, \sigma^n) = C\left( \sum_{a=1}^{n} \delta[\sigma^a ; -1] \right) , $$   (14)

which preserves permutation invariance. Constraints (10,11) now read

$$ \sum_{j=0}^{n} \binom{n}{j} C(j) = 1 \quad \mbox{and} \quad C(n-j) = C(j) \quad (0 \leq j \leq n) . $$   (15)

We obtain n + 1 saddle-point equations for all C(j)'s by differentiating equation (13)
with respect to the order parameters. In the $n \to 0$ limit, we are therefore provided
with an infinity of order parameters C(j) for any real number j. To reach a simple final
expression of the order parameters, we now adopt the functional formalism proposed in
[11,12]. Let us call P(x) the (even) probability distribution of the Boolean
magnetizations $x = \langle S \rangle$, averaged over the disorder $\Delta$. We show in
the Appendix that

$$ C(j) = \int_{-1}^{1} dx \, P(x) \left( \frac{1-x}{1+x} \right)^{j} $$   (16)

in the limit $n \to 0$. The advantage of the above formulation is that P(x) has a clear
significance, directly comparable to numerical simulations. We shall come back to this
point in the next Sections.

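After some algebra, we find the self-consistent equation for the magnetization
distribution P(x), taking into account the saddle-point conditions for all C(j)'s,

$$ P(x) = \frac{1}{1-x^2} \int_{-\infty}^{\infty} \frac{du}{2\pi} \cos\left[ \frac{u}{2} \ln\left( \frac{1+x}{1-x} \right) \right] \exp\left[ -\alpha K + \alpha K \int_{-1}^{1} \prod_{\ell=1}^{K-1} dx_\ell P(x_\ell) \cos\left( \frac{u}{2} \ln A_{(K-1)} \right) \right] , $$   (17)

with

$$ A_{(K-1)} = A_{(K-1)}(\{x_\ell\}, \beta) = 1 + (e^{-\beta} - 1) \prod_{\ell=1}^{K-1} \left( \frac{1+x_\ell}{2} \right) . $$   (18)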

The corresponding replica symmetric free-energy reads

$$ -\beta F(\beta) = \ln 2 + \alpha (1-K) \int_{-1}^{1} \prod_{\ell=1}^{K} dx_\ell P(x_\ell) \ln A_{(K)} + \frac{\alpha K}{2} \int_{-1}^{1} \prod_{\ell=1}^{K-1} dx_\ell P(x_\ell) \ln A_{(K-1)} - \frac{1}{2} \int_{-1}^{1} dx \, P(x) \ln(1 - x^2) . $$   (19)

Note that in eq. (19) $A_{(K)}$ is given by a formula similar to (18), where the upper
bound of the product is replaced by K. To end with, let us remark that equation (17) can
in turn be transformed into an integro-differential equation

$$ \frac{\partial P(x)}{\partial \alpha} = -K P(x) + \frac{K}{2} \int_{-1}^{1} \prod_{\ell=1}^{K-1} dx_\ell \left[ P(x_1) + \alpha (K-1) \frac{\partial P(x_1)}{\partial \alpha} \right] P(x_2) \cdots P(x_{K-1}) \left[ \eta'(x) P(\eta(x)) + \eta'(-x) P(\eta(-x)) \right] , $$   (20)

where $\eta(x) = [(x+1) A_{(K-1)} + x - 1] / [(1+x) A_{(K-1)} + 1 - x]$, $\eta'$ denotes
the derivative of $\eta$ with respect to its argument, and for which the boundary
condition is given by the solution of (17) in $\alpha = 0$:

$$ P(x)|_{\alpha=0} = \delta(x) . $$   (21)

IV. A TOY MODEL : THE K = 1 CASE 

The K = 1 case can be solved either by a direct combinatorial method or within our
statistical mechanics approach. Though this particular case does not present any critical
behaviour, its study will turn out to be useful in understanding the K > 1 models in which
we are interested. Moreover, the K = 1 toy model allows one to check the correctness of
the statistical mechanics results.

In this case, a sample of M clauses is completely described by giving directly the
numbers $t_i$ and $f_i$ of clauses imposing that a certain Boolean variable $S_i$ must be
true or false respectively. Therefore the partition function corresponding to a given
sample reads

$$ Z[\{t,f\}] = \prod_{i=1}^{N} \left( e^{-\beta t_i} + e^{-\beta f_i} \right) , $$   (22)

and the average over the disorder gives

$$ \frac{1}{N} \overline{\ln Z} = \ln 2 - \frac{\alpha \beta}{2} + \sum_{\ell=-\infty}^{\infty} e^{-\alpha} I_\ell(\alpha) \ln\left( \cosh\left( \frac{\beta \ell}{2} \right) \right) , $$   (23)

where $I_\ell$ denotes the $\ell$-th modified Bessel function. The zero temperature limit
gives the ground state energy

$$ E_{GS}(\alpha) = \frac{\alpha}{2} \left[ 1 - e^{-\alpha} I_0(\alpha) - e^{-\alpha} I_1(\alpha) \right] $$   (24)

and the ground state entropy

$$ S_{GS}(\alpha) = e^{-\alpha} I_0(\alpha) \ln 2 . $$   (25)

One may notice that for any $\alpha > 0$, the ground-state energy is positive. Therefore,
the clauses are never satisfiable all together and the overall formula (1) is false with
probability one. Nonetheless, the entropy is finite, implying an exponential degeneracy
of the ground state describing the minimum number $N E_{GS}(\alpha)$ of unsatisfiable
clauses. Such a degeneracy is due to the presence of a finite fraction
$e^{-\alpha} I_0(\alpha)$ of variables which are subject to equal opposite constraints
$t_i = f_i$, and whose corresponding spins may be chosen up or down indifferently without
changing the energy.
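The closed forms (24) and (25) can be checked against a direct combinatorial evaluation. The sketch below assumes that, in the large N limit, the counts $t_i$ and $f_i$ are independent Poisson variables of mean $\alpha/2$ (an assumption stated here for the purpose of the check); each variable then contributes $\min(t_i, f_i)$ violated clauses and a factor 2 to the degeneracy when $t_i = f_i$. The Bessel functions are computed from their power series to keep the example self-contained; function names are illustrative.

```python
import math

def bessel_i(order, x, terms=60):
    """Modified Bessel function I_order(x) from its power series (moderate x only)."""
    return sum((x / 2) ** (2 * k + order)
               / (math.factorial(k) * math.factorial(k + order))
               for k in range(terms))

def k1_formulas(alpha):
    """Ground state energy (24) and entropy (25) per variable for K = 1."""
    e0 = math.exp(-alpha) * bessel_i(0, alpha)
    e1 = math.exp(-alpha) * bessel_i(1, alpha)
    return 0.5 * alpha * (1.0 - e0 - e1), e0 * math.log(2.0)

def k1_direct(alpha, cutoff=60):
    """Average of min(t, f) and of ln 2 * [t == f] over independent
    Poisson(alpha / 2) counts t and f (assumed large-N statistics)."""
    lam = alpha / 2.0
    p = [math.exp(-lam) * lam ** t / math.factorial(t) for t in range(cutoff)]
    energy = sum(p[t] * p[f] * min(t, f)
                 for t in range(cutoff) for f in range(cutoff))
    entropy = math.log(2.0) * sum(q * q for q in p)
    return energy, entropy

if __name__ == "__main__":
    for alpha in (0.5, 1.0, 2.0, 4.0):
        print(alpha, k1_formulas(alpha), k1_direct(alpha))
```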

The above results are indeed recovered in our approach, showing that the RS Ansatz is
exact for all $\beta$ and $\alpha$ when K = 1. Equation (17) can be explicitly solved at
any temperature $1/\beta$ and the solution reads

$$ P(x) = \sum_{\ell=-\infty}^{\infty} e^{-\alpha} I_\ell(\alpha) \, \delta\left( x - \tanh\left( \frac{\beta \ell}{2} \right) \right) , $$   (26)

which, in the limit of physical interest $\beta \to \infty$, becomes

$$ P(x) = e^{-\alpha} I_0(\alpha) \, \delta(x) + \frac{1}{2} \left( 1 - e^{-\alpha} I_0(\alpha) \right) \left( \delta(x-1) + \delta(x+1) \right) . $$   (27)

The finite value of the ground state entropy may be ascribed to the existence of unfrozen
spins whose fractional number is simply the weight of the $\delta$-function in x = 0. At
the same time, it appears that the non zero value of the ground state energy is due to
the presence of completely frozen spins of magnetizations $x = \pm 1$. This is an
important feature of the problem which remains valid for any K, as we shall see in the
following. In Fig. 1 we report the plots of the above energy and entropy at zero
temperature.

V. REPLICA SYMMETRIC SOLUTIONS FOR ALL K

A relevant general mechanism for understanding the onset of critical behaviour in K-SAT
is the accumulation of Boolean magnetizations
$\langle S \rangle = \pm (1 - O(e^{-|z| \beta}))$, ($z = O(1)$), in the limit of zero
temperature and for $\alpha \to \alpha_c$. The emergence of Dirac peaks in $x = \pm 1$
signals that a freezing process has just occurred and that a further increase of $\alpha$
beyond $\alpha_c$ would cause the appearance of unsatisfiable clauses. This scenario -
which can be verified by inspection of eq. (26) for K = 1 - is true also for K > 1. In
fact, by computing the fraction of violated clauses through

$$ E = -\frac{1}{N} \frac{\partial}{\partial \beta} \overline{\ln Z} $$   (28)

at temperature $1/\beta$, one sees that the ground state energy depends only upon the
magnetizations of order $\pm (1 - O(e^{-|z| \beta}))$, if any, and that such
contributions can be described by the introduction of the new rescaled function

$$ R(z) = \lim_{\beta \to \infty} \left[ P\left( \tanh\left( \frac{\beta z}{2} \right) \right) \frac{\partial}{\partial z} \tanh\left( \frac{\beta z}{2} \right) \right] , $$   (29)

whose meaning will be clarified in Section IX. From (17), R(z) fulfills the saddle-point
equation

$$ R(z) = \int_{-\infty}^{\infty} \frac{du}{2\pi} \cos(uz) \exp\left[ -\frac{\alpha K}{2^{K-1}} + \alpha K \int_{0}^{\infty} \prod_{\ell=1}^{K-1} dz_\ell R(z_\ell) \cos\left( u \min(1, z_1, \ldots, z_{K-1}) \right) \right] . $$   (30)

The corresponding ground state energy reads, see (19) and (29),

$$ E_{GS}(\alpha) = \alpha (1-K) \int_{0}^{\infty} \prod_{\ell=1}^{K} dz_\ell R(z_\ell) \min(1, z_1, \ldots, z_K) + \frac{\alpha K}{2} \int_{0}^{\infty} \prod_{\ell=1}^{K-1} dz_\ell R(z_\ell) \min(1, z_1, \ldots, z_{K-1}) - \int_{0}^{\infty} dz \, R(z) \, z . $$   (31)

It is easy to see that the saddle-point equation (30) is in fact a self-consistent
identity for R(z) in the range $z \in [0, 1]$ only. Outside this interval, equation (30)
is merely a definition of the functional order parameter R. This remark will be useful in
the following.

To start with, $R(z) = \delta(z)$ is obviously a solution of (30) for all values of
$\alpha$ and K, giving a zero ground state energy since no spins are frozen with
magnetizations $\pm 1$. Let us assume that R(z) includes another Dirac peak in
$0 < z_0 < 1$. Then, inserting this distribution in the exponential term on the r.h.s. of
(30), we find that R(z) on the l.h.s. necessarily includes all Dirac peaks centered in
$k z_0$, where $k = 0, \pm 1, \pm 2, \pm 3, \ldots$. Next, we proceed iteratively by
inserting again the whole series in the r.h.s. of (30). For large enough k, $k z_0$ is
larger than one and the exponentiated term includes a $\cos u$ contribution, which causes
the presence of Dirac distributions centered in all (positive and negative) integers.
Therefore, as soon as R(z) is different from $\delta(z)$, it contains an infinite set of
Dirac functions peaked around all integer numbers. Clearly, the simplest self-consistent
solution to (30) will be obtained for $z_0 = 1$ since the process described above closes
after one iteration. This solution reads [5]

$$ R(z) = \sum_{\ell=-\infty}^{\infty} e^{-\gamma_1} I_\ell(\gamma_1) \, \delta(z - \ell) , $$   (32)

where $\gamma_1$ depends on K and $\alpha$ and fulfills the implicit equation

$$ \gamma_1 = \alpha K \left[ \frac{1 - e^{-\gamma_1} I_0(\gamma_1)}{2} \right]^{K-1} . $$   (33)

The physical meaning of $\gamma_1$ may be understood by looking at the definition of the
rescaled functional order parameter (29). Turning back to the magnetization distribution,
we indeed find in the zero temperature limit

$$ P(x) = e^{-\gamma_1} I_0(\gamma_1) P_r(x) + \frac{1}{2} \left( 1 - e^{-\gamma_1} I_0(\gamma_1) \right) \left( \delta(x-1) + \delta(x+1) \right) , $$   (34)

where $P_r(x)$ is a regular (i.e. without Dirac peaks in $x = \pm 1$) magnetization
distribution normalized to unity. The above identity is a straightforward extension of the
expression (27) (when K = 1, $\gamma_1 = \alpha$ from (33) and $P_r(x) = \delta(x)$) to
any value of K. Inserting eq. (32) in (31) gives the value of the cost-energy

$$ E_{GS}(\alpha) = \frac{\gamma_1}{2K} \left( 1 - e^{-\gamma_1} I_0(\gamma_1) - K e^{-\gamma_1} I_1(\gamma_1) \right) . $$   (35)

It is therefore clear that, in the RS context, the SAT to UNSAT transition corresponds to
the emergence of peaks centered in $x = \pm 1$ with finite weights, that is to a
transition from $\gamma_1 = 0$ to $\gamma_1 > 0$. This simplest solution centered on
integer numbers, similar to previous findings [11,12,26], was presented in ref. [5].
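The implicit equation (33) and the energy (35) are easy to evaluate numerically. The sketch below is one possible way to do so, assuming a plain fixed-point iteration started from a large value of $\gamma_1$ so that it settles on the non-trivial solution whenever that solution attracts the iteration, and on $\gamma_1 = 0$ otherwise. The function names, starting point and iteration count are illustrative choices, and the Bessel functions are again taken from their power series.

```python
import math

def scaled_bessel(order, x, terms=60):
    """exp(-x) * I_order(x), from the power series of the modified Bessel function."""
    series = sum((x / 2) ** (2 * k + order)
                 / (math.factorial(k) * math.factorial(k + order))
                 for k in range(terms))
    return math.exp(-x) * series

def solve_gamma(alpha, k, start=5.0, iterations=2000):
    """Iterate eq. (33) from a large starting value of gamma_1."""
    g = start
    for _ in range(iterations):
        g = alpha * k * ((1.0 - scaled_bessel(0, g)) / 2.0) ** (k - 1)
    return g

def rs_energy(alpha, k):
    """Ground state energy (35) of the simplest (integer-peaked) RS solution."""
    g = solve_gamma(alpha, k)
    return g / (2 * k) * (1.0 - scaled_bessel(0, g) - k * scaled_bessel(1, g))

if __name__ == "__main__":
    for alpha in (0.8, 1.0, 1.1, 1.5, 2.0, 3.0):
        print(alpha, solve_gamma(alpha, 2), rs_energy(alpha, 2))
```

For K = 1 the exponent K - 1 vanishes, the iteration returns $\gamma_1 = \alpha$ immediately, and `rs_energy` reduces to (24).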
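In addition to (32), there exist other RS solutions to the saddle-point equations [27].
For instance, if we choose $z_0 = 1/2$, the insertion process ends after two iterations
and generates Dirac peaks centered in all integer and half-integer numbers. More
generally, for any integer $p \geq 1$, we may define the solution to (30)

$$ R(z) = \sum_{\ell=-\infty}^{\infty} r_\ell \, \delta\left( z - \frac{\ell}{p} \right) , $$   (36)

having exactly p peaks in the interval [0, 1[, whose centers are $z_\ell = \ell/p$,
$\ell = 0, \ldots, p-1$. The coefficients $r_\ell$ of these distributions are
self-consistently found through

$$ r_\ell = \int_{0}^{2\pi} \frac{d\theta}{2\pi} \cos(\ell \theta) \exp\left( \sum_{j=1}^{p} \gamma_j \left( \cos(j \theta) - 1 \right) \right) $$   (37)

for all $\ell = 0, \ldots, p-1$ where

$$ \gamma_j = \alpha K \left[ \left( \frac{1}{2} - \frac{r_0}{2} - \sum_{\ell=1}^{j-1} r_\ell \right)^{K-1} - \left( \frac{1}{2} - \frac{r_0}{2} - \sum_{\ell=1}^{j} r_\ell \right)^{K-1} \right] , \quad \forall j = 1, \ldots, p-1 , $$

$$ \gamma_p = \alpha K \left( \frac{1}{2} - \frac{r_0}{2} - \sum_{\ell=1}^{p-1} r_\ell \right)^{K-1} . $$   (38)

The corresponding energy reads, from (36) and (31),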

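$$ E_{GS} = \frac{\alpha (1-K)}{p} \sum_{j=1}^{p} \left( \frac{1}{2} - \frac{r_0}{2} - \sum_{\ell=1}^{j-1} r_\ell \right)^{K} + \frac{\alpha K}{2p} \sum_{j=1}^{p} \left( \frac{1}{2} - \frac{r_0}{2} - \sum_{\ell=1}^{j-1} r_\ell \right)^{K-1} - \frac{1}{2p} \sum_{j=1}^{p} j \gamma_j \left( r_0 + 2 \sum_{\ell=1}^{j-1} r_\ell + r_j \right) . $$   (39)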



Note that the last term of (39) includes the coefficient $r_p$, which may be computed
using identity (37). It is easy to check that the first non trivial solution (32)
corresponds to p = 1. Though there might be continuous solutions to (30), we believe they
can be reasonably approximated by the large p solutions we have presented here [27]. In
the following sections, we shall therefore analyze the physical implications of the above
solutions in the different cases of interest, $K = 2$, $K \geq 3$ and $K \gg 1$.

VI. THE K = 2 CASE

The case K = 2 is the first relevant instance of K-SAT. Graph theory has allowed [20]
to show that for $\alpha = \alpha_c(2) = 1$ the problem undergoes a satisfiability
transition which can also be viewed as a P/NP-complete transition, from 2-SAT to
MAX-2-SAT.
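The directed-graph mapping behind the polynomial solvability of 2-SAT can be sketched as follows. This is the standard implication-graph construction with a strongly-connected-components test (Kosaraju's two-pass scheme), given here as an illustration; we assume it is close in spirit to, but not necessarily identical with, the algorithm of [19]. Each clause $(a \vee b)$ is rewritten as the two implications $\bar{a} \to b$ and $\bar{b} \to a$, and the formula is unsatisfiable exactly when some variable lies in the same component as its negation. The literal encoding (+i for $x_i$, -i for its negation, 1-based) and the function name are illustrative.

```python
def two_sat(n_vars, clauses):
    """Return a satisfying list of booleans, or None if the 2-CNF formula is unsatisfiable."""
    def node(lit):                       # literal -> node index (even: x_i, odd: NOT x_i)
        return 2 * (abs(lit) - 1) + (lit < 0)

    n = 2 * n_vars
    graph = [[] for _ in range(n)]
    rgraph = [[] for _ in range(n)]
    for a, b in clauses:                 # (a OR b) == (NOT a -> b) AND (NOT b -> a)
        for u, v in ((node(-a), node(b)), (node(-b), node(a))):
            graph[u].append(v)
            rgraph[v].append(u)

    # Pass 1: iterative depth-first search recording the finishing order.
    order, seen = [], [False] * n
    for s in range(n):
        if seen[s]:
            continue
        seen[s] = True
        stack = [(s, 0)]
        while stack:
            u, i = stack.pop()
            if i < len(graph[u]):
                stack.append((u, i + 1))
                v = graph[u][i]
                if not seen[v]:
                    seen[v] = True
                    stack.append((v, 0))
            else:
                order.append(u)

    # Pass 2: label components on the reversed graph, in reverse finishing order.
    # Components are then numbered in topological order of the implication graph.
    comp, label = [-1] * n, 0
    for s in reversed(order):
        if comp[s] != -1:
            continue
        comp[s] = label
        stack = [s]
        while stack:
            u = stack.pop()
            for v in rgraph[u]:
                if comp[v] == -1:
                    comp[v] = label
                    stack.append(v)
        label += 1

    assignment = []
    for i in range(n_vars):
        pos, neg = comp[2 * i], comp[2 * i + 1]
        if pos == neg:                   # x_i and NOT x_i imply each other
            return None
        assignment.append(pos > neg)     # pick the literal later in topological order
    return assignment

if __name__ == "__main__":
    print(two_sat(2, [(1, 2), (-1, 2), (1, -2)]))
    print(two_sat(2, [(1, 2), (-1, 2), (1, -2), (-1, -2)]))
```

The running time is linear in the number of variables plus clauses, in contrast with the exhaustive enumeration needed for the general case.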

Let us first consider the simplest p = 1 RS solution [5]. The self-consistency equation
(33) leads to the solution $\gamma_1 = 0$ for any $\alpha$. However, for $\alpha > 1$ one
finds another solution $\gamma_1(\alpha) > 0$ which maximizes the free-energy (and
$E_{GS}$) and therefore must be chosen (this is a well-known peculiar aspect of the
replica formalism [1]). When approaching the threshold from above, we indeed find

$$ E_{GS}(\alpha | p = 1) = \frac{4}{27} (\alpha - 1)^3 + O\left( (\alpha - 1)^4 \right) \simeq 0.1481 \, (\alpha - 1)^3 . $$   (40)

As expected, the p = 1 RS theory predicts $E_{GS} = 0$ for $\alpha < 1$ and
$E_{GS} > 0$ when $\alpha > 1$, giving back the rigorous result $\alpha_c(2) = 1$ : for
$\alpha > 1$ the fraction of violated clauses becomes finite and the corresponding CNF
formulas turn out to be false with probability one. The transition taking place at
$\alpha_c$ is of second order with respect to the order parameter $\gamma_1$ and is
accompanied by the progressive appearance of two Dirac peaks for P(x) in $x = \pm 1$ with
equal amplitudes $(1 - e^{-\gamma_1} I_0(\gamma_1))/2$.

It is straightforward to verify that RS solutions with $p \geq 2$ are not present below
$\alpha = 1$. However, above the threshold, one has to check whether their ground state
energies are larger than that of the p = 1 solution, that is if they can be relevant for
MAX-K-SAT. For p = 2, resolution of equations (37) and (38) close to $\alpha_c(2)$ leads
to (discarding the choice $r_1 = 0$ which amounts to the p = 1 solution)

$$ r_0 = 1 - \frac{8 + 2\sqrt{2}}{7} (\alpha - 1) + O\left( (\alpha - 1)^2 \right) , \qquad r_1 = \frac{3 - \sqrt{2}}{7} (\alpha - 1) + O\left( (\alpha - 1)^2 \right) $$   (41)

for the coefficients of the Dirac peaks in z = 0 and z = 1/2 respectively. Inserting these
expansions into the energy (39), one finds

$$ E_{GS}(\alpha | p = 2) = \frac{9 + 4\sqrt{2}}{98} (\alpha - 1)^3 + O\left( (\alpha - 1)^4 \right) \simeq 0.1496 \, (\alpha - 1)^3 , $$   (42)

which is slightly larger than the p = 1 result (40). Numerical calculations for higher
values $p \geq 3$ confirm that the energy increases very slowly with p. We have found
that for large p's the ground state energy is almost stationary, so that the p = 10
solution can be considered as a very fair approximation of the optimal $p \to \infty$ RS
solution. The coefficients $r_\ell$ of the distributions present in the order parameter
R(z) (36) are displayed in Fig. 2 for different values of $\alpha$ and in the cases
p = 1, p = 5 and p = 10.

The ground state energy predicted by the p = 10 RS solution is compared in Fig. 3 to
exhaustive numerical simulations carried out for small systems. For
$\alpha > \alpha_c = 1$, the theoretical estimate of $E_{GS}$ seems to deviate slightly
from the numerical findings, which signals the occurrence of a Replica Symmetry Breaking
(RSB) transition at the threshold. This is in agreement with a stability calculation
performed on the Viana-Bray model [30] around the critical point $\alpha = 1$ [14]. Note
that the Viana-Bray energy is, up to the random field in (3) (irrelevant at zero
temperature), equivalent to the 2-SAT cost function. We may therefore expect that the
result derived in [14] applies to our case. If it were so, there would be an instability
of the replica symmetric saddle-point at the threshold due to replicon-like fluctuations,
breaking replica symmetry above $\alpha_c$. The situation would be reminiscent of the case
of neural networks with continuous weights, where RS theory is able to localize the
storage capacity but not to predict the minimal fraction of errors beyond the transition
[2,29]. The 1/N extrapolation of the simulation results from finite systems to
$N \to \infty$ is shown in Fig. 4 for the particular choice $\alpha = 3$. Data seem in
favor of RSB but one cannot exclude that $1/N^2$ effects could reconcile numerics and
theory. However, one should notice that for $\alpha \gg 1$, the exact asymptotic scaling
of the ground state energy $E_{GS} \simeq \alpha/4$ [24] is compatible with the RS
prediction.

From the above discussion, it is reasonable to conclude that RS theory is exact in the
region 0 < a < 1. As already mentioned, the key quantity to study in this range is the
typical number of solutions to the problem, i.e. the typical ground state entropy S_GS(a)
given by eq.(19) in the β → ∞ limit. Notice that a simpler expression of the ground state
entropy, more precisely of its derivative, may be obtained by differentiating (19) with respect
to a and using the saddle-point equation (20). The result reads



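$$ \frac{\partial S_{GS}}{\partial a} = \int_{-1}^{1} \prod_{\ell=1}^{K} dx_\ell \, P(x_\ell) \, \ln\left[ 1 - \prod_{\ell=1}^{K} \frac{1+x_\ell}{2} \right] \tag{43} $$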



and is valid for any a and K. Using the initial value S_GS|_{a=0} = ln 2 and the above equation
(43), one can in principle compute the ground state entropy for any value of a. However,
due to the difficulty of finding a solution of the integral equation (17), it turns out to be
convenient to develop a systematic expansion of the entropy in the parameter a. We now
briefly present the procedure to be employed for a generic value of K.

Inserting P(x)|_{a=0} = δ(x) into formula (43), we obtain the slope of the entropy at the
origin

$$ \left. \frac{\partial S_{GS}}{\partial a} \right|_{a=0} = \ln\left( 1 - \frac{1}{2^K} \right) \tag{44} $$

which coincides with the annealed result [3,21]. Then, we use eq.(20) to compute the first
derivative of the magnetization distribution at a = 0,



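$$ \left. \frac{\partial P(x)}{\partial a} \right|_{a=0} = -K \, \delta(x) + \frac{K}{2} \, \delta\left( x + \frac{1}{2^K - 1} \right) + \frac{K}{2} \, \delta\left( x - \frac{1}{2^K - 1} \right) . \tag{45} $$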



Next, we differentiate eq.(43) with respect to a and insert the above result to obtain
the second derivative of the ground state entropy at a = 0,



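$$ \left. \frac{\partial^2 S_{GS}}{\partial a^2} \right|_{a=0} = -K^2 \ln\left( 1 - \frac{1}{2^K} \right) + \frac{K^2}{2} \ln\left( 1 - \frac{1}{2^K - 1} \right) + \frac{K^2}{2} \ln\left( 1 - \frac{2^{K-1} - 1}{2^{K-1} (2^K - 1)} \right) \tag{46} $$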



which is negative as required since the entropy is expected to be a concave function of a.
The whole procedure, consisting in successive differentiations of eqs.(20) and (43), can then
be iterated to compute symbolically all the derivatives of P(x) and S_GS(a) with respect to
a at a = 0.
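The first two steps of this expansion are easy to check numerically. The sketch below is an illustration, not the symbolic procedure used in the paper: it assumes that the first derivative of P(x) at a = 0 consists of three Dirac peaks, −K δ(x) + (K/2)[δ(x − c) + δ(x + c)] with c = 1/(2^K − 1), as obtained here from eq.(20) at a = 0, and the function names are hypothetical. The closed form (K²/2) ln[1 − 1/(2^K − 1)^4] in the last function is an algebraic simplification added for this sketch.

```python
import math


def entropy_slope(K):
    """First derivative of S_GS at a = 0, i.e. the annealed slope ln(1 - 2^-K)."""
    return math.log(1.0 - 2.0 ** (-K))


def entropy_curvature(K):
    """Second derivative of S_GS at a = 0.

    Differentiating the K-fold product of P's in eq.(43) gives K identical
    terms: one variable is weighted by dP/da (assumed three-peak form),
    while the other K - 1 variables sit at x = 0.
    """
    c = 1.0 / (2 ** K - 1)
    d_p = [(-K, 0.0), (K / 2.0, c), (K / 2.0, -c)]  # (weight, position)
    return K * sum(w * math.log(1.0 - (1.0 + x) / 2 ** K) for w, x in d_p)


def entropy_curvature_closed_form(K):
    """Simplified form of the same quantity (assumption of this sketch)."""
    return 0.5 * K * K * math.log(1.0 - 1.0 / (2 ** K - 1) ** 4)


if __name__ == "__main__":
    for K in (2, 3):
        # Taylor coefficients of a and a^2 in the entropy expansion
        print(K, entropy_slope(K), entropy_curvature(K) / 2.0)
```

For K = 2 and K = 3 the two printed numbers can be compared with the coefficients of a and a² in the expansions quoted in the text for 2-SAT and 3-SAT.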
In the K = 2 case, we have calculated the power expansion of S_GS(a) up to the seventh
order in a (which differs by less than one percent from the sixth order
Taylor expansion on the range a ∈ [0, 1]). The result reads

$$ S_{GS}(a) = \ln 2 - 0.28768207\,a - 0.01242252\,a^2 - 0.0048241588\,a^3 - 0.0023958362\,a^4 - 0.0013119155\,a^5 - 0.00081617226\,a^6 - 0.00053068034\,a^7 - \ldots , \tag{47} $$

in which, for simplicity, we have reported only a few significant digits of the coefficients. The
latter are computed symbolically and have the form of logarithms of rational numbers. At the
transition we find S_GS(a_c) ≃ 0.38, which is indeed very high as compared to S_GS(0) = ln 2.
A plot of the entropy versus a is shown in Fig. 3. For completeness, we stress that the ground
state entropy and the logarithm of the number of solutions, which coincide below a_c, have
different meanings (and values) above the threshold. In this region, the latter equals −∞
since all solutions have disappeared while the former quantity reflects the degeneracy of the
lowest state (with strictly positive energy) and is continuous at the transition as shown by
simulations.
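As a quick numerical check of the value quoted at the transition, the truncated series (47) can simply be evaluated at a = 1. The sketch below uses only the coefficients printed above; the names `COEFFS_K2` and `entropy_k2` are hypothetical.

```python
import math

# Magnitudes of the coefficients of a, a^2, ..., a^7 in eq.(47); all enter with a minus sign
COEFFS_K2 = [
    0.28768207, 0.01242252, 0.0048241588, 0.0023958362,
    0.0013119155, 0.00081617226, 0.00053068034,
]


def entropy_k2(a, order=7):
    """Truncated expansion of the 2-SAT ground state entropy, eq.(47)."""
    return math.log(2.0) - sum(
        c * a ** (n + 1) for n, c in enumerate(COEFFS_K2[:order])
    )


if __name__ == "__main__":
    s7, s6 = entropy_k2(1.0, 7), entropy_k2(1.0, 6)
    print(s7, abs(s7 - s6) / s7)  # entropy at a_c = 1 and relative change from order 6 to 7
```

The relative difference between the sixth and seventh order truncations at a = 1 gives a rough idea of how fast the series settles on the interval [0, 1].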

Since, for a > a_c, there no longer exist configurations of the S_i's such that the energy (2)
vanishes, the disappearance of the exponentially large number of solutions that were
present below the threshold is surprisingly abrupt. We then conclude that the transition
itself is due to the appearance, with probability one, of contradictory logical loops in all the
solutions and not to a progressive decrease of the number of these solutions down to
zero. This perfectly agrees with the graph-theoretical derivation of the critical a, which is
indeed based on a probabilistic calculation of the appearance of contradictory cycles in oriented
random graphs representing Boolean formulas.
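The notion of a contradictory cycle can be made concrete with the standard implication-graph criterion for 2-SAT: each clause (u ∨ v) yields the oriented edges ¬u → v and ¬v → u, and the formula is unsatisfiable exactly when some variable x has oriented paths both from x to ¬x and from ¬x to x. The sketch below is a minimal illustration of that criterion, not the probabilistic calculation referred to above; the literal encoding (+i for x_i, −i for its negation) and the function names are choices made here.

```python
def implication_graph(n_vars, clauses):
    """Oriented graph on literals; +i stands for x_i and -i for NOT x_i (i = 1..n_vars)."""
    graph = {lit: [] for i in range(1, n_vars + 1) for lit in (i, -i)}
    for u, v in clauses:
        graph[-u].append(v)  # (u OR v) means: if u is false then v must be true
        graph[-v].append(u)  # and symmetrically
    return graph


def reachable(graph, start, target):
    """Depth-first search for an oriented path from start to target."""
    stack, seen = [start], {start}
    while stack:
        node = stack.pop()
        if node == target:
            return True
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def has_contradictory_cycle(n_vars, clauses):
    """True when some x_i and its negation lie on a common oriented cycle."""
    graph = implication_graph(n_vars, clauses)
    return any(
        reachable(graph, i, -i) and reachable(graph, -i, i)
        for i in range(1, n_vars + 1)
    )
```

For instance, the four clauses (x_1 ∨ x_2), (¬x_1 ∨ x_2), (x_1 ∨ ¬x_2), (¬x_1 ∨ ¬x_2) contain such a cycle, whereas any single clause does not. The repeated searches make this version quadratic at worst; a strongly-connected-components pass would do the same job in linear time.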

VII. THE K ≥ 3 CASE

The K = 3 case is the first NP-complete instance of K-SAT. The resolution of the
RS equations leads to a scenario different from the previous K = 2 case. We shall see
below that RS theory does not allow one to derive the value of the threshold a_c(3) ≃ 4.2, which
was estimated by means of finite-size scaling techniques [3]. This is due to the fact that
the calculation of a_c(3) requires the introduction of Replica Symmetry Breaking (RSB),
leading to very complicated equations we have not yet succeeded in solving. However, it is
a remarkable fact that, in the relevant region for 3-SAT, i.e. for a ranging from zero up to
a_c(3), the ground state entropy computed using RS theory seems to be exact.

Let us start with the p = 1 RS solution (32). Solving eq. (33) leads to the following
scenario (see Fig. 5). For a < a_m(3) ≃ 4.667, there exists the solution γ_1 = 0 only. At
a_m(3), a nonzero solution γ_1(a) ≠ 0 discontinuously appears. The corresponding ground
state energy is negative in the range a_m(3) < a < a_s(3) = 5.181, meaning that the new
solution is metastable and that E_GS = 0 up to a_s(3). For a > a_s(3) the γ_1(a) ≠ 0 solution
becomes thermodynamically stable [33].

From the above scheme one is tempted to conclude that a_s(3) corresponds to the desired
threshold a_c(3). However, this prediction is wrong since the experimental value a_c(3) ≃ 4.2
is lower than both a_m(3) and a_s(3). The failure of the above p = 1 RS prediction is also
confirmed by the large K limit. One finds a_m(K) ≃ K 2^K/(16π) and a_s(K) ≃ K 2^K/(4π),
which are larger than the exact asymptotic value a_c(K) ≃ 2^K ln 2. It is worth noticing that
(similarly to the K = 2 case) though the scaling of a_c(K) for large K is wrong within the
p = 1 RS Ansatz, the asymptotic value for large a (and any K) of the ground state energy
for MAX-K-SAT is correctly predicted: E_GS(a) ≃ a/2^K [24].
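The large K comparison can be made explicit with a few lines of arithmetic. The sketch below assumes the asymptotic forms are read as a_m(K) ≃ K 2^K/(16π), a_s(K) ≃ K 2^K/(4π) and a_c(K) ≃ 2^K ln 2; the function names are hypothetical, and since these are asymptotic expressions their values at small K are not meaningful.

```python
import math


def spinodal_p1(K):
    """Assumed asymptotic form of a_m(K) within the p = 1 RS Ansatz."""
    return K * 2.0 ** K / (16.0 * math.pi)


def stability_p1(K):
    """Assumed asymptotic form of a_s(K) within the p = 1 RS Ansatz."""
    return K * 2.0 ** K / (4.0 * math.pi)


def threshold_large_k(K):
    """Asymptotic threshold a_c(K) ~ 2^K ln 2."""
    return 2.0 ** K * math.log(2.0)


if __name__ == "__main__":
    for K in (10, 20, 40, 80):
        print(K,
              spinodal_p1(K) / threshold_large_k(K),
              stability_p1(K) / threshold_large_k(K))
```

Both ratios grow linearly with K, as K/(16π ln 2) and K/(4π ln 2), so with these forms the p = 1 predictions exceed 2^K ln 2 once K is larger than about 35 and about 9 respectively, and the discrepancy keeps widening.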

We now turn to improved RS solutions by looking at larger values of p. When p = 2, the
previous transition scenario remains qualitatively unaltered, but the precise values of the
spinodal and the threshold points are quantitatively modified. One finds, see Fig. 5, that
a_m(3|p = 2) ≃ 4.45 while a_s(3|p = 2) ≃ 4.82. The ground state energy curve is similar to
the p = 1 curve but is shifted to the left. Though still incorrect, the p = 2 prediction is
thus closer to the real threshold value. For larger integers p, we have found that a_m(3|p)
and a_s(3|p) still decrease but quickly converge to the values 4.428 and 4.60 respectively (we
observed a power law convergence by considering values of p up to 30, see Table 1). In
Fig. 6, we have plotted the values of the coefficients r_ℓ (ℓ = 0, …, p − 1) entering (36) for
p = 1, p = 5 and p = 10. The departure of the coefficient curves for p = 5 from the p = 10
curves displaying r_0, r_2, r_4, r_6 and r_8 is clearly visible as soon as the remaining coefficients of
the p = 10 solution, namely r_1, r_3, r_5, r_7 and r_9, which are implicitly set to zero in the p = 5
solution, acquire a non-negligible value.

The first order jump of the order parameters γ_j has a precise meaning in terms of the
fraction of Boolean variables completely determined at the transition. We have seen that,
in 2-SAT, the fraction of Boolean variables whose values cannot fluctuate in the different
ground states, that is the height of the Dirac peaks of P(x) at x = ±1, progressively
increases from zero when a crosses its critical value. For K ≥ 3, a finite fraction of
variables that are entirely constrained by the clause fulfillment condition appears abruptly
at the threshold. We can compute this critical fraction f using the RS theory.
From eq.(36), we simply obtain f = 1 − r_0. The p = 1 solution therefore gives f ≃ 0.656.
Increasing p, the fraction of fixed variables at the threshold converges to f ≃ 0.94, see Table 1.
Such a value is quantitatively consistent with the expected typical entropy (S_GS ≃ 0.03 at
a = 4.60), which may be easily converted into an upper bound for the fraction of fixed
variables by the relation S_GS ≤ (1 − f) ln 2, leading to f ≤ 0.96. Moreover, numerical
investigations confirm that a quite large fraction of the Boolean variables have the same value
(either always true or always false) in all satisfying logical assignments at the threshold a_c ≃ 4.2
[28].
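The bound on the fraction of fixed variables is a one-line computation: if a fraction f of the N variables takes the same value in every solution, at most 2^((1 − f)N) solutions can exist, so the entropy per variable cannot exceed (1 − f) ln 2. The sketch below just inverts this relation; the function name is hypothetical.

```python
import math


def max_fixed_fraction(entropy_per_variable):
    """Largest fraction f of frozen variables compatible with S <= (1 - f) ln 2."""
    return 1.0 - entropy_per_variable / math.log(2.0)


if __name__ == "__main__":
    print(max_fixed_fraction(0.03))  # entropy value quoted at a = 4.60
```

With the entropy 0.03 quoted in the text this gives an upper bound slightly below 0.96, consistent with the value f ≃ 0.94 reached at large p.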

Therefore, we may conclude from the above analysis that RS theory is unable to correctly
predict the value of the transition threshold but provides us with a sensible qualitative
pattern of the SAT/UNSAT transition. When crossing the latter, a first order replica symmetry
breaking transition presumably takes place. The calculation of the threshold value would
require the introduction of a replica symmetry broken Ansatz to replace (14). However, the
issue of RSB in diluted models is largely an open one [15], due to the complex structure
of the saddle-point equations involved, and we shall not attempt to pursue this
direction here.
In the following, we shall rather show that RS theory still provides a consistent and very
precise analysis of the behaviour of the random K-SAT problem below its threshold. This
requires the inspection of the ground state entropy in the region where R(z) = δ(z). Using
the method exposed in the previous Section, we have computed S_GS to the 8th order in a
and found that

$$ S_{GS}(a) = \ln 2 - 0.13353139\,a - 0.00093730474\,a^2 - 0.00011458425\,a^3 - 0.000016252451\,a^4 - 2.4481877 \times 10^{-6}\,a^5 - 3.9910735 \times 10^{-7}\,a^6 - 6.5447303 \times 10^{-8}\,a^7 - 1.167915 \times 10^{-8}\,a^8 - \ldots , \tag{48} $$

in which, again, we have reported only a few significant digits of the (exactly known) coefficients.
The entropy curve is displayed in Fig. 7 in the range 0 < a < a_c(3). By computing the zero
entropy points (a_ze) given by the ℓ-th order entropy expansion, one finds a sequence of
values converging toward a_ze(3) = 4.75 (within one percent of precision), definitely outside
the range of validity 0 < a < a_s(3|p → ∞) ≃ 4.60 of the expansion (48). Notice that
a_ze|_{ℓ=1}(3) = 5.1909 corresponds to the annealed theory. A similar calculation for the cases
K = 4, 5, 6 yields qualitatively similar results which show an even quicker convergence
towards a zero entropy point such that a_c(K) < a_ze(K) (see next Section for the analysis
of the large K limit where both values coincide).
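The zero entropy points of the successive truncations can be reproduced from the printed coefficients alone. The sketch below locates the zero of the ℓ-th order truncation of eq.(48) by bisection; it is an illustration with hypothetical names (`COEFFS_K3`, `entropy_k3`, `zero_entropy_point`), and its accuracy is limited by the number of digits reported in the text.

```python
import math

# Magnitudes of the coefficients of a, a^2, ..., a^8 in eq.(48); all enter with a minus sign
COEFFS_K3 = [
    0.13353139, 0.00093730474, 0.00011458425, 0.000016252451,
    2.4481877e-6, 3.9910735e-7, 6.5447303e-8, 1.167915e-8,
]


def entropy_k3(a, order=8):
    """Truncated expansion of the 3-SAT ground state entropy, eq.(48)."""
    return math.log(2.0) - sum(
        c * a ** (n + 1) for n, c in enumerate(COEFFS_K3[:order])
    )


def zero_entropy_point(order, lo=0.0, hi=10.0, tol=1e-10):
    """Bisection on [lo, hi]; the truncated entropy decreases monotonically for a > 0."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if entropy_k3(mid, order) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for order in range(1, 9):
        print(order, zero_entropy_point(order))
```

The first order truncation returns the annealed value ln 2 / 0.13353139, and the zeros then decrease monotonically with the order since every coefficient has the same sign.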

Therefore, S_GS is always positive below a_s(3|p → ∞). In contradistinction with the
p = 1 RS solution [5], the large p RS solution cannot be ruled out by a simple inspection
of its corresponding entropy. A more important consequence of the previous calculation
of the entropy is that, at the threshold a_c, the RS entropy is still nonzero. The crucial
point is now to understand whether such a value of the entropy is exact up to a_c or whether
Replica Symmetry Breaking (RSB) effects have come into play. This issue may be clarified
by resorting to exhaustive numerical simulations. As reported in [5], simulations in the
range N = 12, …, 28 lead to the conclusion that not only is the entropy indeed finite at the
transition but also that our analytical solution appears exact up to a_c. In particular the
1/N extrapolation of the entropy value at a = 4.17 shows a remarkable agreement between
the numerical trend and the RS prediction S_GS(a_c) ≃ 0.1 (see the inset of the figure in
[5]). RSB corrections to the RS theory thus seem to be absent below a_c, which leads us to
conjecture that the RSB transition could occur at a_c exactly. In this sense the situation
would be partially similar to the binary network case [34]: the RS entropy would be exact
up to a_c (though without vanishing), which would also coincide with the symmetry breaking
point. To end with, let us mention that the existence of an exponential number of solutions
just below the threshold has been demonstrated [28]. The rigorous lower bound of S_GS is
S_min ≃ 0.014 (for 3-SAT), which is compatible with our result.

VIII. THE ASYMPTOTIC CASE OF LARGE K 

In the large K limit, the saddle point equations lead to a closed form for the probability
distribution P(x). In fact, in terms of the quantity

$$ Q(A) = \int_{-1}^{1} \prod_{\ell=1}^{K-1} dx_\ell \, P(x_\ell) \; \delta\left( A - 1 + \prod_{\ell=1}^{K-1} \frac{1+x_\ell}{2} \right) , \tag{49} $$

the differential equation (20) reads

$$ \frac{\partial P(x)}{\partial a} = -K P(x) + K \int_0^\infty dA \left[ Q(A) + a \frac{\partial Q(A)}{\partial a} \right] \frac{1}{2} \left[ \, \ldots \, \right] \tag{50} $$

where η(x) has been defined in Section III. For K ≫ 1, we may expand Q(A) as

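$$ Q(A) \simeq \delta(A-1) + \frac{1}{2^{K-1}} \, \delta'(A-1) + \frac{1}{2} \left( \frac{1}{4} + \frac{1}{4} \int_{-1}^{1} dx \, P(x) \, x^2 \right)^{K-1} \delta''(A-1) + \ldots \tag{51} $$

Under the changes of variables G(y, a) = (1 − tanh^2 y) P(tanh y) and

$$ V(a) = a K \left( \frac{1}{4} + \frac{1}{4} \int_{-1}^{1} dx \, P(x) \, x^2 \right)^{K-1} , \tag{52} $$

equations (50) and (51) simplify into the celebrated heat equation

$$ \frac{\partial G(y,V)}{\partial V} = \frac{1}{2} \frac{\partial^2 G(y,V)}{\partial y^2} , \tag{53} $$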
whose normalized solution is G(y, V) = exp(−y^2/2V)/√(2πV). Turning back to P(x), we
find

$$ P(x) = \frac{1}{\sqrt{2\pi V(a)} \, (1 - x^2)} \exp\left[ -\frac{1}{8 V(a)} \ln^2\left( \frac{1+x}{1-x} \right) \right] \qquad (K \gg 1) \tag{54} $$



where V(a) is given by the self-consistency equation (52). The latter may be easily estimated
for large K: V(a) ≃ aK/4^(K−1). Therefore, when a < a_c(K) ≃ 2^K ln 2, V(a) is vanishingly
small, that is P(x) → δ(x), proving that the replicas become uncoupled in the large K
limit [3]. In addition, it can be checked that the zero entropy point a_ze(K) reaches the
threshold a_c(K) from above. Another way of looking at the entropy is provided by equation
(46): it is simple to check that a_c(K)^2 d^2S_GS/da^2|_{a=0} → 0 for large K. We may then
conclude that the annealed approximation becomes exact when K ≫ 1. As said above, K
may be understood as the connectivity of our model and, in the asymptotic regime K ≫ 1,
RS theory includes only Gaussian interactions as in long-range spin-glass models [34]. In
Fig. 8 we report some instances of the probability distribution, calculated for different values
of K and a. Notice that since the critical point coincides, in this large K limit, with the
zero entropy point (which is far below the point where the RS energy becomes positive,
see previous Section), the probability distribution of the Boolean magnetization is far from
being concentrated at ±1.
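The large K formulas lend themselves to a direct numerical check. The sketch below assumes the Gaussian kernel G(y, V) = exp(−y²/2V)/√(2πV) quoted above, the relation P(x) = G(atanh x, V)/(1 − x²) implied by the change of variables, and the estimate V(a) ≃ aK/4^(K−1); it verifies by finite differences that G obeys ∂G/∂V = (1/2) ∂²G/∂y², and by a midpoint rule that P(x) is normalized on (−1, 1). All function names are hypothetical.

```python
import math


def heat_kernel(y, V):
    """Normalized solution G(y, V) of the heat equation."""
    return math.exp(-y * y / (2.0 * V)) / math.sqrt(2.0 * math.pi * V)


def magnetization_density(x, V):
    """P(x) obtained from G through x = tanh(y)."""
    return heat_kernel(math.atanh(x), V) / (1.0 - x * x)


def variance_large_k(a, K):
    """Large K estimate V(a) ~ a K / 4^(K-1)."""
    return a * K / 4.0 ** (K - 1)


def heat_equation_residual(y, V, dy=1e-3, dV=1e-5):
    """Central finite-difference estimate of dG/dV - (1/2) d2G/dy2."""
    d_v = (heat_kernel(y, V + dV) - heat_kernel(y, V - dV)) / (2.0 * dV)
    d_yy = (heat_kernel(y + dy, V) - 2.0 * heat_kernel(y, V)
            + heat_kernel(y - dy, V)) / (dy * dy)
    return d_v - 0.5 * d_yy


def total_weight(V, n=20000):
    """Midpoint-rule integral of P(x) over (-1, 1)."""
    h = 2.0 / n
    return h * sum(magnetization_density(-1.0 + (i + 0.5) * h, V) for i in range(n))


if __name__ == "__main__":
    K = 10
    V = variance_large_k(2.0 ** K * math.log(2.0), K)  # value at a = 2^K ln 2
    print(V, total_weight(V), heat_equation_residual(0.3, 0.5))
```

At a = 2^K ln 2 the estimate gives V = 4K ln 2/2^K, which is small but nonzero for K = 10 and shrinks quickly as K grows, so P(x) is a narrow peak around x = 0 rather than a pair of peaks at ±1.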

IX. ALTERNATIVE DERIVATION OF THE SELF-CONSISTENCY EQUATION FOR R(z)

In this Section, we discuss an alternative heuristic derivation of the self-consistency
equation (30) for R(z) without resorting to replicas. As a result of this approach, we shall
unveil the physical meaning of the R(z) functional order parameter and interpret the replica
symmetry assumption in probabilistic terms. The method we adopt is known as the cavity
approach [1,26] and here we need to transpose it to the zero temperature case. For the sake
of simplicity, we shall focus on the 2-SAT case, extensions to higher instances of K-SAT
being straightforward.

To each Boolean variable x_i and for a given logical formula, we associate a quantity z_i
defined as follows. We call z_i the difference between the number C of unsatisfied clauses
when x_i = 0 (false) and when x_i = 1 (true), averaged over the set of all optimal (for K-SAT
or MAX-K-SAT) Boolean assignments, that is ground state configurations,

$$ z_i = \langle C(x_i = 0) \rangle - \langle C(x_i = 1) \rangle . \tag{55} $$

Next, we consider the set of all z_i's and define T(z) as their probability distribution after
having averaged over all possible logical formulas. The calculation of T(z) proceeds according
to the following four steps.

Let us consider a given Boolean variable, say x_1. (I) For uncorrelated random CNF
expressions, the probability that neither x_1 nor x̄_1 appears in the logical formula is simply
(1 − 2/N)^M ≃ e^(−2a). In such a case, x_1 can be indifferently chosen either true or false, and z_1 = 0.
Therefore, we obtain a first contribution

$$ T_0(z_1) = e^{-2a} \, \delta(z_1) \tag{56} $$

to T(z_1). (II) With probability 2a e^(−2a), x_1 will belong to a single clause, e.g. x_1 ∨ x̄_2. The
latter is unsatisfied if and only if x_1 is false and x_2 is true. Therefore, z_1 = 0 if x_2 is allowed
to be false (the clause is satisfied independently of x_1), i.e. if z_2 < 0. In order to see what
happens in the average case, let us consider the case where x_2 is true in the majority of
optimal Boolean assignments. At first sight, z_2 would appear as a strictly positive integer
since it coincides with a difference of integer numbers [11,12], leading to z_1 = 1. However,
as we consider averaged differences, the z_j may well be rational numbers [27]. Such a
counterintuitive behavior can be easily understood with the following simple argument. If
x_2 appears (on average) in less than one clause, that is if 0 < z_2 < 1, it cannot be present in
another clause and we must have z_1 = z_2. Conversely, if z_2 > 1, x_2 is more frozen than x_1
and z_1 saturates its upper bound equal to one. Notice that this result may be made rigorous
by working at finite temperature [26]. To complete the probabilistic analysis of this second
contribution T_1(z_1) to T(z_1), we have to take into account the three other possible clauses
involving x_1 and x_2, and collect the corresponding contributions. We find

$$ T_1(z_1) = 2a \, e^{-2a} \int_0^\infty dz_2 \, T(z_2) \left[ \frac{1}{2} \delta\big( z_1 - \min(1, z_2) \big) + \frac{1}{2} \delta\big( z_1 + \min(1, z_2) \big) + \delta(z_1) \right] . \tag{57} $$

(III) By iterating the above reasoning, we consider a logical formula such that x_1 belongs
to exactly j clauses. The probability of such an event obeys the Poisson law (2a)^j e^(−2a)/j!.
Almost surely, the variables x_2, x_3, …, x_{j+1} appearing in these j clauses are different from
each other. Moreover, in the large N limit, any pair of variables x_m and x_n (2 ≤ m, n ≤
j + 1) is always at a large "distance" from one another, where the relative distance is
defined as the minimal number of logical links (clauses) joining x_m to x_n, see ref. [26]. As
a consequence, the joint probability distribution of z_2, z_3, …, z_{j+1} factorizes and, due to the
statistical independence of the choices of the clauses, we have

$$ T_j(z_1) = \frac{(2a)^j}{j!} \, e^{-2a} \int_0^\infty \prod_{\ell=2}^{j+1} dz_\ell \, T(z_\ell) \sum_{m=0}^{j} \sum_{a_1 < \ldots < a_m} \frac{1}{2^m} \sum_{\sigma_1, \ldots, \sigma_m = \pm 1} \delta\left( z_1 - \sum_{\ell=1}^{m} \sigma_\ell \min(1, z_{a_\ell}) \right) , \tag{58} $$

where the a_ℓ's run between 2 and j + 1. (IV) Summing the previous expressions for all
values of j, we recover eq.(30) with R(z) = T(z).

Of course, the self-consistency equation for R(z) is correct provided that replica symmetry
is valid, while T(z) is defined independently of any replica calculation. Therefore
the equality between the two quantities cannot hold in general and is due to the assumption
of the absence of correlations between different z_i's we have made above [1,26]. This is the
probabilistic meaning of replica symmetry.
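The four steps above can be mimicked by a small sampling experiment. The sketch below uses population dynamics, a technique swapped in here for illustration and not taken from the paper: a population of z values stands for T(z), and each update draws a Poisson number of clauses with mean 2a and adds, for each clause, a contribution that is assumed to be 0 with probability 1/2 and ±min(1, |z|) with probability 1/4 each, where z is drawn from the population. This reading of the clause statistics, the function names and the parameters (population size, number of sweeps, seed) are all assumptions of the sketch.

```python
import math
import random


def poisson(mean, rng):
    """Knuth's multiplication method, adequate for the small means used here."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def cavity_update(population, a, rng):
    """Draw one new z from the assumed recursion for random 2-SAT."""
    z_new = 0.0
    for _ in range(poisson(2.0 * a, rng)):
        u = min(1.0, abs(rng.choice(population)))
        r = rng.random()
        if r < 0.25:
            z_new += u      # clause pushes the variable towards "true"
        elif r < 0.5:
            z_new -= u      # clause pushes the variable towards "false"
        # otherwise the clause is satisfied by its other literal
    return z_new


def nonzero_fraction(a, size=2000, sweeps=40, seed=0):
    """Fraction of the population with z != 0 after relaxation."""
    rng = random.Random(seed)
    population = [rng.choice((-1.0, 1.0)) for _ in range(size)]
    for _ in range(sweeps * size):
        population[rng.randrange(size)] = cavity_update(population, a, rng)
    return sum(1 for z in population if z != 0.0) / size


if __name__ == "__main__":
    for a in (0.5, 0.9, 1.5, 2.0, 3.0):
        print(a, nonzero_fraction(a))
```

With the ±1 initial condition used here all z values stay integers. Under these assumptions the population collapses onto z = 0 when a is below one and keeps a finite weight away from z = 0 above it, which is the kind of behaviour expected from a distribution that reduces to δ(z) in the satisfiable phase.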

X. CONCLUSION AND PERSPECTIVES 

In this paper, we have presented the replica symmetric theory of the random K-SAT
problem. We have shown that the natural quantity emerging from the analytical study is
the distribution of the average values of the Boolean variables, indicating to what extent
the latter are determined by the constraints imposed by the clauses. The knowledge of this
probability distribution requires the resolution of a functional saddle-point equation, for
which we have presented an iterative sequence of improved solutions. The most surprising
result we have derived is the fact that the entropy is finite just below the transition, i.e.
that the latter is characterized by an abrupt disappearance of all exponentially numerous
solutions due to the emergence of contradictory loops.

Some numerical simulations we have performed for K = 2 as well as in the K = 3 case
are in remarkable quantitative agreement with our RS calculations of the entropy jump at
the threshold [5]. Both the known results on the stability of 2-SAT like models and the
numerical simulations hint at the correctness of the RS theory up to the critical ratio of
clauses per Boolean variable.

Were it so, the physical picture of the space of solutions would not necessarily be
simple. Replica symmetry can indeed hide a non-trivial structure of the solutions, as has been
shown for long-range spin-glass models [35] and in the (closer to K-SAT) case of neural
networks [36]. This issue is probably of crucial importance for understanding the performance
of local search algorithms.

As for the values of the critical thresholds themselves, RS gives the correct prediction
a_c = 1 for K = 2 but fails in estimating the critical a_c for K ≥ 3. The study of the
(hard) instances K ≥ 3 of the K-SAT problem requires breaking replica symmetry. As a
consequence, their direct study will not be easy and will require non-trivial analytical efforts.

Another route which one can follow to reach a better understanding of the K > 2
case consists in starting from the relatively well understood 2-SAT case and modifying it
to get closer to the 3-SAT problem. Such a perturbative approach can be implemented
by considering a mixed model, which one may refer to as the (2 + ε)-SAT model (ε ∈ [0, 1]),
composed of (1 − ε)M clauses of length two and εM clauses of length three (thus interpolating
smoothly between the polynomial 2-SAT and the NP-complete 3-SAT models). Analytical
investigations suggest that the threshold can be computed exactly up to ε = ε_0 = 0.413.
For ε < ε_0, one finds a continuous SAT/UNSAT transition at a_c(ε) = 1/(1 − ε). The model
shares the same physical features as the random 2-SAT model. For ε > ε_0, the SAT/UNSAT
transition becomes a discontinuous (with respect to the order parameters) RSB transition,
similarly to the 3-SAT model. Preliminary numerical results suggest that the above model
can be of interest for exploring the connection between the nature of the RS to RSB phase
transition and the onset of exponential regimes in search algorithms running on samples
generated near criticality [31].
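A mixed formula of this kind is easy to generate, and the continuous threshold has a simple reading. The sketch below takes ε as the fraction of length-three clauses, the convention under which a_c(ε) = 1/(1 − ε) follows from the 2-SAT threshold: the length-two clauses alone have ratio (1 − ε)a, which reaches the 2-SAT critical value 1 at a = 1/(1 − ε). Function names and the literal encoding (variable index, negated?) are hypothetical, and the threshold formula is only meant for ε below ε_0.

```python
import random


def critical_ratio(eps):
    """Continuous threshold a_c(eps) = 1 / (1 - eps), quoted for eps < eps_0."""
    return 1.0 / (1.0 - eps)


def random_mixed_formula(n_vars, a, eps, rng):
    """Random (2 + eps)-SAT formula with M = a N clauses.

    Assumption of this sketch: round(eps * M) clauses have length three,
    the remaining ones have length two, with distinct variables per clause
    and independent random negations.
    """
    n_clauses = round(a * n_vars)
    n_three = round(eps * n_clauses)
    formula = []
    for index in range(n_clauses):
        length = 3 if index < n_three else 2
        variables = rng.sample(range(n_vars), length)
        formula.append([(v, rng.random() < 0.5) for v in variables])
    return formula


if __name__ == "__main__":
    print(critical_ratio(0.413))
    sample = random_mixed_formula(100, 1.5, 0.4, random.Random(1))
    print(len(sample), sum(len(clause) == 3 for clause in sample))
```

At ε = 0.413 the formula gives a threshold close to 1.70, well below the 3-SAT estimate of about 4.2.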

Acknowledgments : We thank O. Dubois, S. Kirkpatrick and B. Selman for useful 
discussions. 

APPENDIX A: RELATIONSHIPS BETWEEN ORDER PARAMETERS 

Identity (9) implies that one can make a change of variables from the overlaps Q
to the generating function c. Let us call M the linear operator

$$ M(\{a_1, a_2, \ldots, a_{2p}\}; \sigma) = \sigma^{a_1} \sigma^{a_2} \cdots \sigma^{a_{2p}} . \tag{A1} $$

For simplicity, we set the overlap with no replica index (p = 0) equal to 1, and all overlaps
with an odd number of replicas are null. The
dimension of M is therefore equal to 2^n. To any sequence {a_1, a_2, …, a_n}, we associate an
n-component vector τ such that τ^b = −1 if b belongs to the sequence and τ^b = 1 otherwise.
From definition (A1), we obtain

$$ M(\tau; \sigma) = \prod_{a=1}^{n} \frac{1}{2} \left( 1 + \sigma^a + \tau^a - \sigma^a \tau^a \right) . \tag{A2} $$

As a consequence, M equals the n-th power (for the tensor product) of a two by two matrix. The
Jacobian of the change of variables is found to be

$$ |M| = (-2)^{n 2^{n-1}} \tag{A3} $$

and is different from zero. We may invert M and find

$$ c(\sigma) = \frac{1}{2^n} \left( 1 + \sum_{p=1}^{n/2} \sum_{a_1 < a_2 < \ldots < a_{2p}} Q^{a_1, a_2, \ldots, a_{2p}} \, \sigma^{a_1} \sigma^{a_2} \cdots \sigma^{a_{2p}} \right) . \tag{A4} $$
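The value of the Jacobian can be confirmed for small n by building the matrix explicitly. The sketch below assumes the product form M(τ; σ) = ∏_a (1 + σ^a + τ^a − σ^a τ^a)/2 for the matrix elements, with rows labelled by τ and columns by σ in the same order, and computes the determinant exactly with rational arithmetic; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def m_entry(tau, sigma):
    """Matrix element: product over replicas of (1 + s + t - s*t) / 2."""
    value = Fraction(1)
    for t, s in zip(tau, sigma):
        value *= Fraction(1 + s + t - s * t, 2)
    return value


def determinant(matrix):
    """Exact Gaussian elimination on a square list of lists of Fractions."""
    rows = [row[:] for row in matrix]
    size, det = len(rows), Fraction(1)
    for col in range(size):
        pivot = next((r for r in range(col, size) if rows[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            rows[col], rows[pivot] = rows[pivot], rows[col]
            det = -det  # a row swap flips the sign
        det *= rows[col][col]
        for r in range(col + 1, size):
            factor = rows[r][col] / rows[col][col]
            for c in range(col, size):
                rows[r][c] -= factor * rows[col][c]
    return det


def jacobian(n):
    """Determinant of the 2^n x 2^n matrix M for n replicas."""
    configs = list(product((1, -1), repeat=n))
    return determinant([[m_entry(t, s) for s in configs] for t in configs])


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, jacobian(n), (-2) ** (n * 2 ** (n - 1)))
```

For n = 1 the matrix reduces to the two by two block [[1, 1], [1, −1]] of determinant −2, and the tensor-product structure then fixes the exponent n 2^(n−1).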
Let us now turn to the replica symmetric Ansatz structure. From definition (14) and
identity (A4), we obtain

$$ C(j) = \frac{1}{2^n} \left( 1 + \sum_{p=1}^{n/2} Q_p \sum_{a_1 < a_2 < \ldots < a_{2p}} \sigma^{a_1} \sigma^{a_2} \cdots \sigma^{a_{2p}} \right) , \tag{A5} $$

where 2j = n − Σ_{a=1}^{n} σ^a and the replica symmetric overlaps Q_p are calculated from the
magnetization distribution [11,12]

$$ Q_p = \int_{-1}^{1} dx \, P(x) \, x^{2p} . \tag{A6} $$

To establish the relationship between the C(j)'s and the distribution P(x), we have to
expand the sum over replicas appearing in (A5) in powers of the magnetization of the σ's,

$$ \sum_{a_1 < a_2 < \ldots < a_{2p}} \sigma^{a_1} \sigma^{a_2} \cdots \sigma^{a_{2p}} = \sum_{r=0}^{p} H^{(p)}_r \left( \sum_{a=1}^{n} \sigma^a \right)^{2r} . \tag{A7} $$

The matrix H can be computed by first finding the generating function of H^(−1) and
then inverting the latter. We finally find



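$$ H^{(p)}_r = \frac{1}{(2p)! \, (2r)!} \, \frac{\partial^{2p}}{\partial y^{2p}} \left[ (1 - y^2)^{n/2} \left( \frac{1}{2} \ln \frac{1+y}{1-y} \right)^{2r} \right]_{y=0} . \tag{A8} $$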



Using the above expression and inserting eq.(A6) into (A5), one recovers identity (16) in the
limit n → 0.
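The fact that the sum over ordered replica indices depends on the σ's only through j (equivalently, through their total magnetization) can be checked by brute force for small n. The sketch below relies on a standard generating-function identity brought in for this purpose, not quoted from the appendix: the sum over a_1 < … < a_m of σ^{a_1}⋯σ^{a_m} is the coefficient of y^m in ∏_a (1 + y σ^a) = (1 + y)^(n−j) (1 − y)^j, where j is the number of σ's equal to −1. Function names are hypothetical.

```python
from itertools import combinations, product
from math import comb, prod


def replica_sum(sigma, order):
    """Sum over a_1 < ... < a_order of sigma^{a_1} ... sigma^{a_order}."""
    return sum(
        prod(sigma[a] for a in subset)
        for subset in combinations(range(len(sigma)), order)
    )


def generating_function_coefficient(n, j, order):
    """Coefficient of y^order in (1 + y)^(n - j) * (1 - y)^j."""
    return sum(
        comb(n - j, k) * comb(j, order - k) * (-1) ** (order - k)
        for k in range(order + 1)
    )


def check_identity(n):
    """Compare both expressions for every configuration of n spins and every order."""
    for sigma in product((1, -1), repeat=n):
        j = sum(1 for s in sigma if s == -1)  # 2j = n - sum of sigma
        for order in range(n + 1):
            if replica_sum(sigma, order) != generating_function_coefficient(n, j, order):
                return False
    return True


if __name__ == "__main__":
    print(all(check_identity(n) for n in range(1, 7)))
```

Expanding the coefficient of y^(2p) in powers of Σ_a σ^a = n − 2j is then what produces a finite sum over r = 0, …, p of even powers, in line with the structure of (A7).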
TABLES

    p      a_s       f
    1    5.1812    0.6561
    3    4.7271    0.7889
    6    4.6451    0.8497
    9    4.6240    0.8765
   12    4.6153    0.8920
   15    4.6107    0.9022
   18    4.6080    0.9095
   21    4.6063    0.9150
   24    4.6051    0.9193
   27    4.6042    0.9227
   30    4.6036    0.9256

TABLE I. p dependence of the RS critical ratio a_s and of the fraction f of fixed variables.






FIGURES 

FIG. 1. Ground state cost (bold line), or fraction of violated clauses, and entropy (thin line) 
versus a = M/N for K = 1. 

FIG. 2. Order parameters r_i (i = 0, …, p − 1) corresponding to the different RS solutions
p = 1 (dashed line), p = 5 (dashed-dotted lines) and p = 10 (continuous lines), for K = 2 and
a = M/N ∈ [1, 3]. The upper curve within each group represents r_0 whereas the overlapping ones
in the lower part of the figure represent r_i for i = 1, …, p − 1 (p = 5, 10).

FIG. 3. RS ground state entropy (decreasing curve, left scale) and RS ground state cost
(increasing overlapping curves computed for p = 1, 10, right scale) versus a = M/N for K = 2.
At a = a_c = 1 the ground state cost becomes positive, signaling a second order SAT/UNSAT
transition (at the same point the RS solution becomes unstable). The value of the entropy at the
critical ratio is 0.38. The dashed lines interpolate the numerical data of exhaustive simulations on
systems of size N = 16, 20, 24 and averaged over 15000, 7500, 2500 samples respectively. Error bars
are within 10% for the entropy and even smaller for the energy and thus not reported explicitly.

FIG. 4. 1/N extrapolation of the minimal fraction of violated clauses (i.e. ground state cost) 
for a = 3 and N = 18, 20, 22, 24, 26 averaged over 20000, 15000, 10000, 7500 and 5000 samples 
respectively. The extrapolated value appears to be different from the value 0.14472 toward which 
the RS solutions with increasing p rapidly converge. This is in agreement with the expected 
instability of the RS solutions for a > a c . 
FIG. 5. RS ground state energy for K = 3 (continuous lines) computed for p = 1, 10 (lines
corresponding to larger values of p would not be distinguishable) and compared with the results
of numerical simulations on systems of size N = 16, 20, 24 and averaged over 15000, 7500, 2500
samples respectively (error bars are of the order of the size of the dots). The RS ground state
energy becomes positive (for p ≫ 1) at a_s ≃ 4.60 whereas the value at which the unstable
solution appears is a_m ≃ 4.428. Both values are greater than the numerical estimate of the critical
ratio (4.2). The dashed line is meant to guide the eye along the expected, yet unknown, RSB
behaviour of the ground state energy.

FIG. 6. Order parameters r_i (i = 0, …, p − 1) corresponding to the different RS solutions p = 1
(dashed line), p = 5 (dashed-dotted lines) and p = 10 (continuous lines), for K = 3 versus
a = M/N. Within each group of p = 1, 5, 10 curves, the upper one represents r_0 whereas the others
represent r_i (i = 1, …, p − 1), in top-down order.

FIG. 7. RS entropy (continuous line) for K = 3 versus a = M/N compared with the results of
exhaustive numerical simulations for N = 16, 20, 24 and averaged over 15000, 7500, 2500 samples
respectively (see also ref. [4]). Error bars are within 10% and not reported explicitly.

FIG. 8. Probability distributions P(x) as functions of the magnetization x, calculated for
a = 2^K ln 2 (critical threshold in the K ≫ 1 limit) and for K = 10, 12, 14, 16, 18.